Forward Backward Similarity Search in Knowledge Networks

Authors

  • Baoxu Shi
  • Lin Yang
  • Tim Weninger
Abstract

Similarity search is a fundamental problem in social and knowledge networks such as GitHub, DBLP, and Wikipedia. Existing network similarity measures are limited because they only consider similarity from the perspective of the query node. However, due to the complicated topology of real-world networks, ignoring the preferences of target nodes often produces odd or unintuitive results. In this work, we propose a dual-perspective similarity metric called Forward Backward Similarity (FBS) that efficiently computes topological similarity from the perspective of both the query node and the candidate nodes. The effectiveness of our method is evaluated by traditional quantitative ranking metrics and large-scale human judgement on four large real-world networks. The proposed method matches human preference and outperforms other similarity search algorithms on community overlap and link prediction. Finally, we demonstrate top-5 rankings for five famous researchers on an academic collaboration network to illustrate how our approach captures semantics more intuitively than other approaches.

Computing the similarity of two or more objects in an information network is the main focus of a large amount of scientific research and technological development. Friendship recommendation in social networks is one example, but web search, community detection, general link prediction, list augmentation, and dozens of other application areas all depend on some notion of similarity in the underlying networks. Similarity is multi-faceted; various traits can be used to determine similarity depending on the specific problem domain. Entire fields of research are dedicated to the development of algorithms that effectively and efficiently retrieve objects similar to some query object, e.g., information retrieval, computer vision, and databases (broadly speaking).
Researchers and practitioners understand that network topology plays a critical role in the identification of object similarity [26, 27, 35]. An appreciation of topological features has led to the development of models of network growth, clustering, prediction, and classification.

Given a query vertex u, what we need is a network similarity metric that finds a target vertex v to be similar if the two vertices satisfy the following criteria:

  1. u is highly connected to v, and
  2. v is highly connected to u.

A typical approach to personalized search is to measure the similarity between some query node and a set of candidate target nodes (possibly all other nodes). After the similarities of the candidate nodes have been found, the user is typically presented with a top-K list of candidate nodes ordered by their similarity scores.

For example, in citation networks Case et al. previously defined six citation behaviors [5], which we simplify into two categories: a) intra-domain citations and b) cross-domain citations. Intra-domain references often include prior work that is directly related to the referencing paper, and are the type of references that a reader would expect to see included in the experimental comparison section of the referencing paper. On the other hand, cross-domain citations often represent paradigms, platforms, and data sets that come from a separate, loosely related area. For example, the closely related references of this paper include references to personal PageRank [11], SimRank [12], and personal SALSA [3], while the loosely related references of this paper include references to the DBLP [21] and ArnetMiner [48] datasets, or the reference to the Spark system [53].

arXiv:1611.09316v1 [cs.SI] 28 Nov 2016
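
The two criteria stated in the introduction (u is highly connected to v, and v is highly connected to u) can be illustrated with a small sketch. The excerpt does not give the FBS formula, so the code below is not the authors' method. It assumes personalized PageRank (random walk with restart) as the connectivity measure in each direction and, as a further assumption, multiplies the forward score by the backward score. The function names, the restart probability `alpha`, and the toy graph are all hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # adjacency list; every neighbor must also be a key


def personalized_pagerank(graph: Graph, source: str,
                          alpha: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Random walk with restart at `source`, computed by power iteration."""
    rank = {n: 0.0 for n in graph}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for n, out in graph.items():
            moving = (1.0 - alpha) * rank[n]
            if out:
                for m in out:
                    nxt[m] += moving / len(out)
            else:
                nxt[source] += moving  # dangling node: send its mass back to the source
        nxt[source] += alpha  # restart mass (the ranks always sum to 1)
        rank = nxt
    return rank


def dual_perspective_scores(graph: Graph, query: str,
                            alpha: float = 0.15) -> Dict[str, float]:
    """Assumed combination: forward score (query -> v) times backward score (v -> query)."""
    forward = personalized_pagerank(graph, query, alpha)
    scores = {}
    for v in graph:
        if v != query:
            backward = personalized_pagerank(graph, v, alpha)
            scores[v] = forward[v] * backward[query]
    return scores


def top_k(graph: Graph, query: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k candidates with the highest dual-perspective score."""
    scores = dual_perspective_scores(graph, query)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy undirected graph: "u" links to "v" and to a hub that has 20 other neighbors.
leaves = [f"leaf{i}" for i in range(20)]
graph: Graph = {"u": ["v", "hub"], "v": ["u"], "hub": ["u"] + leaves}
graph.update({leaf: ["hub"] for leaf in leaves})

forward_only = personalized_pagerank(graph, "u")
best_dual = top_k(graph, "u", k=1)[0][0]
```

In this toy graph the forward-only score ranks `hub` above `v`, because the walk started at `u` accumulates mass around the hub. From the hub's perspective, however, `u` is only one of 21 neighbors, so the combined score ranks `v` first. This mirrors the motivation for considering the target node's perspective as well as the query node's.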

Similar resources

A Flexible Integrated Forward/ Reverse Logistics Model with Random Path-based Memetic Algorithm

Due to business and environmental issues, the efficient design of an integrated forward/reverse logistics network has recently attracted more attention from researchers. The significance of transportation cost and customer satisfaction spurs an interest in developing a flexible network design model with different delivery paths. This paper proposes a flexible mixed-integer programming model to ...

Hierarchical Phrase Alignment Harmonized with Parsing

In this paper, we propose a hierarchical phrase alignment method that aims to acquire translation knowledge. Previous methods utilize the correspondence of sub-trees between bilingual parsing trees after determining the parsing result. The method described in this paper combines partial tree candidates, and selects the best sequence of partial trees. Then, a structural similarity measure (calle...

Backwards State-space Reduction for Planning in Dynamic Knowledge Bases

In this paper we address the problem of planning in rich domains, where knowledge representation is a key aspect for managing the complexity and size of the planning domain. We follow the approach of Description Logic (DL) based Dynamic Knowledge Bases, where a state of the world is represented concisely by a (possibly changing) ABox and a (fixed) TBox containing the axioms, and actions that al...

Forward-Backward Building Blocks for Evolving Neural Networks with Intrinsic Learning Behaviors

This paper describes the forward-backward module: a simple building block that allows the evolution of neural networks with intrinsic supervised learning ability. This expands the range of networks that can be efficiently evolved compared to previous approaches, and also enables the networks to be invertible, i.e., once a network has been evolved for a given problem domain, and trained on a particu...

MusicBLAST - Gapped Sequence Alignment for MIR

We propose an algorithm, MusicBLAST, for approximate pattern search/matching on symbolic musical data. MusicBLAST is based on the BLAST algorithm, one of the most commonly used algorithms for similarity search on biological sequence data [1, 2]. MusicBLAST can be used in combination with an arbitrary similarity measure (e.g., melodic, rhythmic or combined) and retrieves multiple occurrences of ...

Journal title:
  • Knowl.-Based Syst.

Volume 119  Issue 

Pages  -

Publication date 2017